Using Crowdsourcing for Evaluation of Translation Quality
Authors
Abstract
In recent years, a wide variety of machine translation services have emerged due to the increase in demand for tools that support multilingual communication. Machine translation services have the advantage of low cost, but the disadvantage of low translation quality. Therefore, translations need to be evaluated in order to predict the quality of machine translation services. In most cases, the quality of machine translation services is assessed by professionals using a 5-point scale. However, evaluation by bilingual professionals requires a great deal of time and money, which makes this approach impractical for all translation services. In this study, we introduce a crowdsourcing translation evaluation method. Crowdsourcing is a novel framework in which small tasks are completed on request by an anonymous crowd. The framework has advantages over professionals in terms of time and cost, but the language abilities of crowdsourcing workers cannot be guaranteed. Moreover, some crowdsourcing workers, such as spammers, aim to gain illicit rewards from a task. Due to these problems, it is unknown whether crowdsourcing can provide the same evaluation quality as professional evaluators.

The following problems need to be solved in order to utilize crowdsourcing for translation evaluation:

1. Whether crowdsourcing evaluation can replace professional evaluation. The error range between evaluation scores from crowdsourcing workers and professional contributors needs to be analyzed to determine the feasibility of the crowdsourcing approach.

2. How evaluation quality changes with the number of crowdsourcing workers. In general, a translation evaluation score approaches a constant value as the number of workers increases, so the convergence of the crowdsourced evaluation score needs to be predicted.

In this research, an experiment was conducted on Amazon Mechanical Turk (MTurk), the largest crowdsourcing platform. A Chinese-to-English translation evaluation task on a 5-point scale was designed, and workers were asked to perform the task for a compensation of 0.05 dollars per evaluation. In order to filter out malicious workers, a qualification test was introduced. The crowdsourced translation evaluations were then analyzed with respect to the following aspects: comparison of individual evaluations by crowdsourcing workers and professional contributors for each given translation, where the individual evaluations by crowdsourcing workers showed a large disparity compared to the evaluations from professionals; and, at the same time, the effect of increasing the number of workers on the number of errors compared to professionals …
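The convergence behavior described above can be made concrete with a small analysis sketch. The snippet below is a minimal illustration, not the authors' code or data: it assumes hypothetical 5-point scores from individual crowd workers and a professional reference score per sentence, aggregates the first k worker scores by simple averaging (the thesis may use a different aggregation), and reports the mean absolute error against the professional ratings for increasing k.

```python
from statistics import mean

# Hypothetical data (illustrative values only, not from the study):
# one professional 5-point score per translated sentence, and a list of
# 5-point scores collected from individual MTurk workers for that sentence.
professional_scores = [4, 2, 5, 3, 1, 4]
crowd_scores = [
    [5, 4, 3, 4, 4, 5, 4],
    [2, 1, 3, 2, 2, 1, 3],
    [5, 5, 4, 5, 5, 4, 5],
    [3, 4, 2, 3, 3, 4, 2],
    [1, 2, 1, 1, 2, 1, 2],
    [4, 3, 5, 4, 4, 3, 4],
]

def mae_with_k_workers(k: int) -> float:
    """Mean absolute error of the k-worker average against the professional scores."""
    errors = []
    for pro, workers in zip(professional_scores, crowd_scores):
        aggregated = mean(workers[:k])        # average the first k crowd ratings
        errors.append(abs(aggregated - pro))  # disparity vs. the professional rating
    return mean(errors)

# The error typically shrinks and then flattens as workers are added,
# which is the convergence behavior the abstract refers to.
for k in range(1, len(crowd_scores[0]) + 1):
    print(f"workers={k}  MAE={mae_with_k_workers(k):.2f}")
```

With k = 1 the output reflects the large per-worker disparity mentioned in the abstract; as k grows, the averaged score stabilizes, which is what a convergence analysis would measure.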
Similar Resources
Crowdsourcing for Evaluating Machine Translation Quality
The recent popularity of machine translation has increased the demand for the evaluation of translations. However, the traditional evaluation approach, manual checking by a bilingual professional, is too expensive and too slow. In this study, we confirm the feasibility of crowdsourcing by analyzing the accuracy of crowdsourcing translation evaluations. We compare crowdsourcing scores to profess...
Getting Expert Quality from the Crowd for Machine Translation Evaluation
This paper addresses the manual evaluation of Machine Translation (MT) quality by means of crowdsourcing. To this purpose, we replicated the ranking evaluation of the Arabic-English BTEC task proposed at the IWSLT 2010 Workshop by hiring non-experts through the CrowdFlower interface to Amazon’s Mechanical Turk. In particular, we investigated the effectiveness of “gold units” offered by CrowdFlow...
Crowdsourcing Translation: Professional Quality from Non-Professionals
Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We prop...
Crowdsourcing Translation by Leveraging Tournament Selection and Lattice-Based String Alignment
Crowdsourcing translation tasks typically face issues due to poor quality and spam translations. We propose a novel method for generating large multilingual text corpora leveraging Tournament Selection and Lattice-Based String Alignment without requiring expert involvement or Gold data. We use crowdsourcing for gathering a set of candidate translations of a given source sentence. A crowd sourced...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Publication date: 2013